Designing for errors: similarities and differences of disfluency rates and prosodic characteristics across domains
Authors
Abstract
This paper examines characteristics of disfluencies in human-human interaction (HHI) and human-computer interaction (HCI) corpora to outline their similarities and differences. The main variables studied are disfluency rates and prosodic features. Structured, table-like input increases the disfluency rate in HCI and decreases it in HHI. Direct exposure to the interface (visibility) also increases the rate and gives speech a distinctive prosodic pattern of hyperarticulation. In most of the studied corpora, silences at the disfluency site are not predicted by syntactic rules. Similarities between HCI and HHI exist mainly in the prosodic realizations of the reparandum and the repair. The findings contribute to better understanding and modeling of disfluencies. Speech-based interfaces need to focus on communication types that are well understood and that lend themselves to good modeling.
Similar resources
Which Words Are Hard to Recognize? Prosodic, Lexical, and Disfluency Factors that Increase ASR Error Rates
Many factors are thought to increase the chances of misrecognizing a word in ASR, including low frequency, nearby disfluencies, short duration, and being at the start of a turn. However, few of these factors have been formally examined. This paper analyzes a variety of lexical, prosodic, and disfluency factors to determine which are likely to increase ASR error rates. Findings include the follo...
Which words are hard to recognize? Prosodic, lexical, and disfluency factors that increase speech recognition error rates
Despite years of speech recognition research, little is known about which words tend to be misrecognized and why. Previous work has shown that errors increase for infrequent words, short words, and very loud or fast speech, but many other presumed causes of error (e.g., nearby disfluencies, turn-initial words, phonetic neighborhood density) have never been carefully tested. The reasons for the ...
Production of English Lexical Stress by Persian EFL Learners
This study examines the phonetic properties of lexical stress in English produced by Persian speakers learning English as a foreign language. The four most reliable phonetic correlates of English lexical stress, namely fundamental frequency, duration, intensity, and vowel quality were measured across Persian speakers’ production of the stressed and unstressed syllables of five English disyllabi...
Preliminaries to a Theory of Speech
This thesis examines disfluencies (e.g., “um”, repeated words, and a variety of forms of self-repair) in the spontaneous speech of adult normal speakers of American English. Despite their prevalence, disfluencies have traditionally been viewed as irregular events and have received little attention. The goal of the thesis is to provide evidence that, on the contrary, disfluencies show remarkably...
Design and evaluation of a speech synthesis model using concatenation of units sensitive to prosodic context
This paper describes the design and evaluation of prosodically-sensitive concatenative units for a Persian text-to-speech (TTS) synthesis system. The syllables used are prosodically conditioned in the sense that a single conventional syllable is stored as different versions taken directly from the different prosodic domains of the prosodically labeled, read sentences. The three levels of the Per...